An annotation scheme for complex disfluencies

نویسندگان

  • Peter A. Heeman
  • Andy McMillin
  • J. Scott Yaruss
چکیده

In this paper, we present an annotation scheme for disfluencies. Unlike previous schemes, this scheme allows complex disfluencies with multiple backtracking points to be annotated, which are common in stuttered speech. The scheme specifies each disfluency in terms of word-level annotations, thus making the scheme useful for building sophisticated language models of disfluencies. As determining the annotation codes is quite difficult, we have developed a pen and paper procedure in which the annotator lines up the words into rows and columns, from which it is straight-forward for the annotator to determine the annotation tags.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Intercoder reliability in annotating complex disfluencies

In previous work, we presented an annotation scheme that can describe complex disfluencies. In this paper, we first show the prevalence of complex disfluencies and illustrate the types of distinctions that our scheme allows. Second, we present an annotation tool that allows the scheme to be easily applied. Third, we present the results of a reliability study in annotating complex disfluencies w...

متن کامل

Automatic detection and annotation of disfluencies in spoken French corpora

In this paper we propose a multi-step system for the semiautomatic detection and annotation of disfluencies in spoken corpora. A set of rules, statistical models and machine learning techniques are applied to the input, which is a transcription aligned to the speech signal. The system uses the results of an automatic estimation of prosodic, part-of-speech and shallow syntactic features. We pres...

متن کامل

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

Annotation and analysis of overlapping speech in political interviews

Looking for a better understanding of spontaneous speech-related phenomena and to improve automatic speech recognition (ASR), we present here a study on the relationship between the occurrence of overlapping speech segments and disfluencies (filled pauses, repetitions, revisions) in political interviews. First we present our data, and our overlap annotation scheme. We detail our choice of overl...

متن کامل

Extending and Scaling up the Chinese Treebank Annotation

We discuss on-going efforts to scale up the Chinese Treebank annotation and extending Chinese treebanking to informal genres like conversational speech, news groups and weblogs, as well as discussion forums. The original Chinese Treebank annotation scheme was designed for formal genres such as newswire and magazine articles, where the language is very formal and each document is carefully edite...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006